Time-Based Reward Shaping in Real-Time Strategy Games

Authors

  • Martin Midtgaard
  • Lars Vinther
  • Jeppe R. Christiansen
  • Allan M. Christensen
  • Yifeng Zeng
Abstract

Real-Time Strategy (RTS) is a challenging domain for AI, since it involves not only a large state space but also dynamic actions that agents execute concurrently. This problem cannot be solved optimally with general Q-learning techniques, so we propose a solution using a Semi-Markov Decision Process (SMDP). We present a time-based reward shaping technique, TRS, to speed up learning in reinforcement learning. In particular, we show that our technique preserves solution optimality for some SMDP problems. We evaluate our method in the Spring game Balanced Annihilation and provide benchmarks of its performance.
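The abstract does not give the form of TRS, so the following is only a minimal sketch of the general idea it names: a tabular SMDP Q-learning update in which the discount depends on how long an action took, combined with a shaping term that depends on that duration. The function name `smdp_q_update`, the `time_penalty` parameter, and the linear penalty `time_penalty * duration` are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def smdp_q_update(q, state, action, reward, duration, next_state, actions,
                  alpha=0.1, gamma=0.9, time_penalty=0.0):
    """One tabular SMDP Q-learning step with a duration-based shaping term.

    `duration` is the number of time units the action took to complete.
    The linear penalty below is a hypothetical shaping choice, not the
    paper's TRS definition.
    """
    # Assumed shaping: subtract a cost proportional to the time spent.
    shaped_reward = reward - time_penalty * duration
    # Value of the best action available in the successor state.
    best_next = max(q[(next_state, a)] for a in actions)
    # SMDP discounting: the future is discounted by gamma ** duration.
    target = shaped_reward + (gamma ** duration) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage with made-up states and actions.
q = defaultdict(float)
actions = ["build", "wait"]
smdp_q_update(q, "s0", "build", reward=1.0, duration=3,
              next_state="s1", actions=actions, time_penalty=0.1)
```

With `time_penalty=0.0` the update reduces to plain SMDP Q-learning, which makes it easy to compare learning with and without the assumed shaping term.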


Similar resources

Robust Opponent Modeling in Real-Time Strategy Games using Bayesian Networks

Opponent modeling is a key challenge in Real-Time Strategy (RTS) games, because the environment is adversarial and the player cannot predict the future actions of her opponent. Additionally, the environment is partially observable due to the fog of war. In this paper, we propose an opponent model that is robust to the observation noise caused by the fog of war. In order to cope...

Applying Reinforcement Learning to RTS Games

Real-Time Strategy (RTS) games are challenging domains for AI, since they involve not only a large state space but also dynamic actions that agents execute concurrently. This problem cannot be solved optimally with general Q-learning techniques, so we propose a solution using a Semi-Markov Decision Process (SMDP). We present a time-based reward shaping technique, TRS, to speed up the learning...

Transition Entropy in Partially Observable Markov Decision Processes

This paper proposes a new heuristic algorithm suitable for real-time applications using partially observable Markov decision processes (POMDPs). The algorithm is based on a reward shaping strategy that includes entropy information in the reward structure of a fully observable Markov decision process (MDP). This strategy, as illustrated by the presented results, exhibits near-optimal performance...

Enhancing Video Games Policy Based on Least-Squares Continuous Action Policy Iteration: Case Study on StarCraft Brood War and Glest RTS Games and the 8 Queens Board Game

With the recent rapid rise of video games and the increasing number of players, only a challenging game with strong policies, actions, and tactics survives. How the game responds to opponent actions is the key issue for popular games. Many algorithms have been proposed to solve this problem, such as Least-Squares Policy Iteration (LSPI) and State-Action-Reward-State-Action (SARSA), but they mainly...

Efficiently Solving Joint Activity Based Security Games

Despite recent successful real-world deployments of Stackelberg Security Games (SSGs), scale-up remains a fundamental challenge in this field. The latest techniques do not scale up to domains where multiple defenders must coordinate time-dependent joint activities. To address this challenge, this paper presents two branch-and-price algorithms for solving SSGs, SMARTO and SMARTH, with three nov...


Publication date: 2010